Predicting soil cone index and assessing suitability for wind and solar farm development using machine learning techniques

This study proposes a novel approach that combines machine learning models to predict soil compaction using soil cone index values. The methodology incorporates support vector regression (SVR) to gather input data on key soil parameters, and the output data from SVR are used as inputs for additional machine learning techniques: Gradient Boosting, Decision Tree, Artificial Neural Networks, and Adaptive Neuro-Fuzzy Inference System. Evaluation of these artificial intelligence (AI) techniques shows that the XGBoost model outperforms the others, with a low mean square error and a high correlation coefficient. The effectiveness of the XGBoost model has implications for soil management, agricultural productivity, and land suitability evaluations, particularly for renewable energy projects. By integrating advanced AI techniques, stakeholders can make informed decisions about land use planning, sustainable farming practices, and the feasibility of renewable energy installations. Overall, this research contributes to soil science by demonstrating the potential of AI techniques, specifically the XGBoost model, in accurately predicting soil compaction and supporting optimal soil management practices.


Methodology
This section presents the methodologies and implementations of the machine learning techniques used for soil compaction prediction. The focus is on four specific models: Artificial Neural Network (ANN), Support Vector Regression (SVR), Decision Tree (DT), and Adaptive Neuro-Fuzzy Inference System (ANFIS). Each technique is explained, including its characteristics, strengths, and weaknesses in the context of soil compaction prediction. The steps taken to train and optimize each model using the dataset, which consists of soil cone index and associated input variables, are described in detail.

Support vector regression (SVR)
Support Vector Regression (SVR) is a widely used supervised learning algorithm known for its effectiveness in handling non-linear data and complex patterns. It excels in regression analysis, particularly in scenarios with a large number of variables, and is robust to outliers. In this study, SVR is applied to predict four crucial independent variables (soil moisture content, soil bulk density, electrical conductivity, and sampling depth), which are known predictors of soil cone index. To facilitate SVR modeling, the initial step involves normalizing the input variables for consistent scaling. The dataset is then divided into training and testing sets, with the former used for model training and the latter for evaluation. The SVR model is trained using data from experiments at the Educational and Experimental Farm of the University of Mohaghegh Ardabili, Ardabil [57]. This dataset, collected through advanced soil testing techniques, serves as a reliable foundation for soil cone index prediction. The trained SVR model yields accurate predictions for the four input variables, offering valuable insights for soil scientists and agronomists. Figure 1 displays the performance on testing data. Table 1 presents the SVR analysis results on the soil cone index, highlighting significant influences of soil texture, tractor traffic, and sampling depth (P < 0.01). Additionally, the interaction effect between moisture content and tractor traffic is statistically significant (P < 0.05), emphasizing their crucial role in soil compaction and load-bearing capacity.
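The normalization and train/test split described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the sample rows, the units, the 20% test fraction, and the function names `min_max_normalize` and `train_test_split` are all assumptions introduced here, since the text does not state the scaling method or the split ratio used for SVR.

```python
import random

def min_max_normalize(rows):
    """Scale each column of `rows` to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` with a fixed seed and split it in two."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical rows: moisture (%), bulk density (g/cm^3), EC (dS/m), depth (cm)
samples = [
    [12.0, 1.35, 0.8, 10.0],
    [18.0, 1.50, 1.1, 20.0],
    [15.0, 1.42, 0.9, 30.0],
    [21.0, 1.60, 1.4, 40.0],
    [24.0, 1.55, 1.2, 50.0],
]
scaled = min_max_normalize(samples)
train_rows, test_rows = train_test_split(scaled, test_fraction=0.2, seed=42)
```

For brevity the sketch scales the whole dataset before splitting. A stricter variant would compute the column minima and maxima from the training rows only, so that no information from the testing set enters the scaling.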

Design of decision tree
Decision Trees (DT) predict soil cone index from variables such as moisture content, bulk density, electrical conductivity, and depth, using the Classification and Regression Tree (CART) algorithm in MATLAB. Input data are normalized and split into a training set, which is used to build the tree, and a testing set, which is used to evaluate its performance. DT handles complex, non-linear datasets well and identifies each variable's impact on soil cone index, which supports optimized soil management. In this research, a tree-like model predicts outcomes based on input variables. Nodes represent features, and branches represent decision paths. The tree is built recursively, splitting the data until each subset is homogeneous or a stopping criterion is met. The number of nodes and the tree depth are determined through experimentation. The dataset is divided into training, validation, and testing sets (40%, 30%, 30%). Metrics (accuracy, precision, recall, F1-score) determine the best model.
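The recursive splitting step can be illustrated with a single split search on one feature. The sketch below assumes a squared-error criterion, which is the usual choice for regression trees; the text does not confirm which criterion the MATLAB model used, and the data values and the names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) of the best binary split on one feature."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # no split: a single leaf holding every sample
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data: bulk density (g/cm^3) vs. cone index (arbitrary units)
density = [1.2, 1.3, 1.4, 1.6, 1.7, 1.8]
cone_index = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]
threshold, error = best_split(density, cone_index)
```

A full tree would apply `best_split` to every feature, keep the feature with the lowest error, and then recurse on the two resulting subsets until the maximum depth or another stopping criterion is reached.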

Design of XGBoost
XGBoost (Extreme Gradient Boosting) is a powerful machine learning algorithm that has shown remarkable performance in predicting soil cone index based on input variables such as soil moisture content, soil bulk density, electrical conductivity, and sampling depth. It has also been effectively utilized in predicting the likelihood of default for bank customers, leveraging input variables like credit score, income, and debt-to-income ratio.
To build the XGBoost model, the dataset was carefully preprocessed, handling missing values and encoding categorical variables. The data were then divided into training, validation, and testing sets, with 60%, 20%, and 20% of the data respectively allocated to each set. This partitioning strategy allowed for effective training of the XGBoost model using the training set, fine-tuning of hyperparameters using the validation set, and thorough evaluation of the model's performance on the independent testing set. The XGBoost model achieved an accuracy of 96% on the testing set. This performance surpassed that of traditional machine learning algorithms such as logistic regression and random forest, highlighting the effectiveness of XGBoost in predictive tasks.
One of the key advantages of XGBoost lies in its ability to handle complex datasets by capturing non-linear relationships between input and output variables. Additionally, XGBoost is adept at handling missing data and automatically learning feature interactions, reducing the reliance on extensive feature engineering.
In conclusion, XGBoost proves to be an invaluable tool for predicting default risk in the banking industry and exhibits broad applicability in various domains. Its accuracy, coupled with its capacity to optimize decision-making processes through accurate predictions, makes XGBoost a reliable and efficient machine learning algorithm.
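The boosting idea behind XGBoost can be sketched with a deliberately simplified stand-in: plain gradient boosting for squared loss, where each one-split regression stump is fitted to the residuals of the current ensemble. This is not XGBoost itself, which adds regularization and second-order gradient information, and it is not the study's model. The data, the number of rounds, the learning rate, and the names `fit_stump` and `boost` are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical single-feature data: moisture (%) vs. cone index (arbitrary units)
moisture = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
cone_index = [40.0, 36.0, 30.0, 22.0, 15.0, 12.0]
model = boost(moisture, cone_index)
mean_value = sum(cone_index) / len(cone_index)
baseline_mse = mse(cone_index, [mean_value] * len(cone_index))
boosted_mse = mse(cone_index, [model(x) for x in moisture])
```

The comparison between `baseline_mse` (always predicting the mean) and `boosted_mse` shows how successive small corrections reduce the training error. The learning rate plays the same role as the learning rate values reported for the XGBoost configurations in this study.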

Design of ANN
In this study, artificial neural networks (ANN), a computational technique inspired by the structure and functionality of the human brain, were employed to predict soil cone index. ANNs can model complex non-linear relationships between the input and output variables, which proved advantageous for this research. However, they require ample training data, their results are difficult to interpret, and they may overfit if the model becomes too complex.
The designed feed-forward backpropagation artificial neural network consisted of interconnected layers. MATLAB was utilized for training the network, employing three algorithms: gradient descent with momentum, the Levenberg-Marquardt algorithm, and the scaled conjugate gradient algorithm. The optimal number of neurons in the middle layer was determined by trial and error. The activation function used between the input and middle layers was the hyperbolic tangent sigmoid function, while a linear function was employed between the middle and output layers.
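The layer structure described above (a tangent sigmoid middle layer followed by a linear output) corresponds to the forward pass sketched below. The sketch assumes a single middle layer, random untrained weights, and four normalized inputs; training with Levenberg-Marquardt or the other algorithms is omitted, and the names `init_layer`, `dense`, and `forward` are hypothetical.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights in [-0.5, 0.5] and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer followed by an activation function."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, hidden, output):
    """Tanh middle layer, then a linear output layer with one neuron."""
    h = dense(inputs, *hidden, activation=math.tanh)
    return dense(h, *output, activation=lambda z: z)[0]

rng = random.Random(0)
hidden = init_layer(4, 40, rng)   # 4 inputs, 40 middle-layer neurons (assumed)
output = init_layer(40, 1, rng)   # single output: predicted cone index
# Hypothetical normalized inputs: moisture, bulk density, EC, depth
prediction = forward([0.3, 0.6, 0.2, 0.5], hidden, output)
```

Because the weights are random, `prediction` has no physical meaning here. The sketch only shows how the two activation functions are arranged between the layers.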
The dataset was divided into three categories: training, validation, and testing sets, with 60%, 20%, and 20% of the data allocated to each category, respectively. To assess the performance of the developed networks and determine the most effective training method for the data, various evaluation metrics were calculated, including mean square error (MSE), sum of square errors (SSE), coefficient of determination (R²), and prediction accuracy (PA).
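The error metrics named above follow standard definitions, sketched below together with the correlation coefficient (R) used later in the comparison. The actual and predicted values are hypothetical, and prediction accuracy (PA) is left out because the text does not define it.

```python
import math

def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def mse(actual, predicted):
    """Mean squared error."""
    return sse(actual, predicted) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / total sum of squares."""
    mean = sum(actual) / len(actual)
    total = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse(actual, predicted) / total

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    ma = sum(actual) / len(actual)
    mp = sum(predicted) / len(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical values, for illustration only
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = {
    "SSE": sse(actual, predicted),
    "MSE": mse(actual, predicted),
    "R2": r_squared(actual, predicted),
    "R": pearson_r(actual, predicted),
}
```

Note that R and R² are different quantities: R measures linear association between predictions and observations, while R² as defined here measures the share of variance explained by the predictions themselves.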

Design of ANFIS
ANFIS (Adaptive Neuro-Fuzzy Inference System) combines the learning capabilities of artificial neural networks (ANNs) with the reasoning abilities of fuzzy logic to achieve accurate predictions. In this study, a multilayer neural network-based fuzzy system consisting of five layers was proposed as the ANFIS model. For the prediction of soil cone index, 80% of the total data was allocated for training the ANFIS model, while the remaining 20% was reserved for validation. Triangular membership functions were chosen for the input variables due to their precision and suitability. The hybrid learning method was adopted for soil cone index prediction using ANFIS. The dataset includes both training and check data, without specific signs or symbols used for differentiation. Two partitioning methods, namely grid partitioning and subtractive clustering, were employed to initialize the fuzzy inference system (FIS) within ANFIS. The grid partitioning method allowed the user to determine the type and number of input membership functions, while the subtractive clustering method employed a data-driven approach. The hybrid approach also offers interpretability through linguistic rules and fuzzy membership functions, making ANFIS a powerful and precise tool for comprehensive soil compaction analysis.
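A triangular membership function and a grid-style fuzzification of one normalized input can be sketched as follows. The even spacing of the triangles over [0, 1] is an assumption made for illustration, and `trimf` and `fuzzify` are hypothetical Python helpers written here, not calls into MATLAB or any fuzzy logic library.

```python
def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(x, low, high, n_sets=5):
    """Membership of x in n_sets evenly spaced triangles covering [low, high]."""
    step = (high - low) / (n_sets - 1)
    centers = [low + i * step for i in range(n_sets)]
    return [trimf(x, c - step, c, c + step) for c in centers]

# Hypothetical normalized input value, e.g. scaled soil moisture content
memberships = fuzzify(0.3, 0.0, 1.0)
```

With five evenly spaced triangles, any input inside the range activates at most two neighbouring fuzzy sets, and the two membership degrees sum to one. During training, ANFIS adjusts the triangle parameters rather than keeping this fixed grid.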

Simulation and results
In this section, the results obtained from the four AI techniques used for soil cone index prediction are analyzed individually. Each model's performance is assessed based on key metrics such as Mean Squared Error (MSE) and R-squared (R²) values. Following the individual analysis, a comprehensive comparison of the AI techniques is conducted to evaluate the models' predictive accuracy. Furthermore, a specific threshold based on the soil cone index is established to determine the suitability of the soil for wind or solar farm development. This analysis aims to provide insights into the effectiveness of the AI techniques and their application in assessing soil suitability for renewable energy projects. Figure 2 shows a detailed flow chart of the system.

Decision tree (DT) performance
The results of the decision tree (DT) models are presented in Table 2. The models were developed using different objective functions and maximum depths, and their performance was evaluated based on training error and validation error (RMSE). The CART 2 model with gini objective and maximum depth of 6 had the lowest validation error (0.246), indicating that it was the best performing DT model among the five. However, it is important to note that the performance of DT models can be highly dependent on the specific data set and objective function used. In addition to the DT models, Table 3 presents some sample observations along with their corresponding feature values and target values. These observations can be used to gain a better understanding of the relationship between the features and the target variable. For example, it can be observed that observation 6 has the highest target value (35), and it also has a relatively high feature value (2.33). Similarly, observation 10 has the lowest feature value (1.67) and the highest target value (45). These observations can provide insights into which features may be most important for predicting the target variable. Furthermore, to assess the accuracy and reliability of the DT model in real-world scenarios, a performance evaluation was conducted using independent data. The evaluation results are presented in Fig. 3, which demonstrates the model's capability to generalize and make accurate predictions beyond the training and validation datasets. By using independent data for testing, the evaluation indicates how the model performs in practical applications and supports its suitability for real-world decision-making processes. Table 3 presents the details of the CART model.

XGBoost performance
Table 4 presents the structure of the XGBoost models.
Table 5 provides additional details of the XGBoost models.

ANN performance
Table 6 presents a comprehensive analysis of the neural network models designed for soil cone index prediction. These models were developed using the Levenberg-Marquardt optimization algorithm, with varying numbers of middle layers and neurons. Tables 7 and 8 show the neural network architectures with optimized hidden layer neurons. The input to middle layer connections were modeled using the hyperbolic tangent sigmoid function, while the middle layer to output connections used the linear function. Among the different configurations, the network with 40 neurons in each middle layer (Network 3) performed best in predicting soil cone index. The superiority of Network 3 is evident through multiple evaluation metrics. It displayed a lower mean square error (0.138) and sum of squares error, indicating its ability to minimize prediction deviations. Furthermore, Network 3 showed a higher correlation coefficient (0.99), which signifies a strong linear relationship between the predicted and actual values. The maximum simulation accuracy achieved by Network 3 (83%) further emphasizes its accuracy in capturing the underlying patterns in the soil cone index data. Finally, it attained the highest determination coefficient (0.83), indicating its effectiveness in explaining the variation in the soil cone index. To visualize the performance of Network 3, Fig. 5 presents a diagram illustrating the best-fitted line between the real data (T) and the predicted data (Y). The regression coefficients extracted from the analysis revealed a high degree of correlation (0.99), further validating the robustness of Network 3 in predicting soil cone index. Network 3 outperformed all other network configurations in terms of the evaluation metrics: its mean square error, determination coefficient, and simulation accuracy were consistently better. In summary, Network 3, with 40 neurons in each middle layer, developed using the Levenberg-Marquardt optimization algorithm, proves to be highly effective in predicting soil cone index.

ANFIS performance
Table 9 provides a comprehensive overview of the ANFIS model's characteristics, including the use of trimf membership functions for the input and output variables, the use of five membership functions, and the adoption of the hybrid learning method. The table also reports the evaluation of the model based on key statistical parameters such as the root mean square error (RMSE), percentage of relative error (ε), and coefficient of determination (R²). The results highlight the model's accuracy in estimating and predicting soil cone index values, as evidenced by the low RMSE values and high coefficient of determination. Furthermore, Fig. 9 illustrates a direct comparison between the actual and predicted data, underscoring the ANFIS model's ability to forecast soil cone index. The close alignment between the actual and predicted values further validates the model's effectiveness in capturing the inherent patterns and trends within the data. In summary, the ANFIS model stands as a dependable tool for the estimation and prediction of soil cone index values. The observed reduction in error (0.1688), the visual agreement between actual and predicted data, and the favorable evaluation results validate the model's accuracy and performance in this domain.

Comprehensive performance comparison
For more reliable results, additional tests were conducted to evaluate the performance of four machine learning models in predicting soil properties, specifically soil cone index. The models tested included XGBoost, decision trees (DT), artificial neural networks (ANN), and adaptive neuro-fuzzy inference system (ANFIS). The evaluation criteria included mean square error (MSE) and correlation coefficient (R). The XGBoost model demonstrated the best performance with the lowest MSE of 0.0017 and the highest R value of 0.9986. In contrast, DT had the highest validation error of 0.35, while ANFIS and ANN had validation errors of 0.27 and 0.14, respectively, as shown in Table 10. The table presents the performance of the four machine learning models, with XGBoost showing the best performance followed by ANN, ANFIS, and DT. The results confirmed that machine learning models can be effective tools for predicting soil properties, and the XGBoost model exhibited the highest accuracy among the models evaluated in this study. Figure 10 provides a visual comparison of the four machine learning models, clearly illustrating the superior performance of XGBoost and the relative performance of the other models. It further reinforces the findings of the study, emphasizing the effectiveness of machine learning models, particularly XGBoost, in accurately predicting soil cone index values.

Suitability assessment of soil cone index for wind and solar farm development
The soil cone index, once computed, played a pivotal role in assessing site suitability for the establishment of solar or wind farms. This evaluation determines whether the prevailing soil conditions are conducive to the successful implementation of such renewable energy projects. The decision-making process relied on predefined threshold values, which guided the determination of site suitability. To augment this assessment, the proposed techniques were applied to analyze the data derived from the soil cone index calculations, yielding insights into the soil's characteristics and behavior. In this context, it is essential to consider the load-bearing capacity of the soil, particularly when the cone index falls below the critical threshold of 200 kPa.
When the soil's cone index exceeds this threshold, the soil possesses adequate load-bearing capacity, making it suitable for renewable energy projects. However, if the cone index falls below the 200 kPa mark, certain considerations need to be taken into account.
In such cases, further measures, such as additional excavation or soil improvement techniques, may be necessary to enhance the soil's load-bearing capacity and ensure the stability of the proposed renewable energy installations. Figures 11 and 12 depict the regions where the cone index is lower and higher than the critical limit, respectively, thereby highlighting areas that require attention and intervention. This approach gives a comprehensive understanding of the soil's suitability for solar or wind farms, even when the cone index falls below 200 kPa. It allows for informed decision-making, as stakeholders can identify specific areas that require remediation to meet the necessary load-bearing requirements. Ultimately, by addressing these factors and taking appropriate measures, the selection of sites for renewable energy projects can be optimized, supporting their long-term success and sustainability.
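The threshold rule described above can be expressed as a small check against the 200 kPa limit. The depth profile below is hypothetical, the function names are introduced here for illustration, and treating a reading exactly equal to the threshold as needing improvement is an assumption, since the text only addresses values above and below it.

```python
THRESHOLD_KPA = 200.0  # critical cone index stated in the text

def classify(cone_index_kpa, threshold=THRESHOLD_KPA):
    """Label one reading; values at or below the threshold are flagged."""
    return "suitable" if cone_index_kpa > threshold else "needs improvement"

def depths_needing_improvement(profile, threshold=THRESHOLD_KPA):
    """Return the sorted depths whose cone index does not exceed the threshold."""
    return sorted(d for d, ci in profile.items() if ci <= threshold)

# Hypothetical predicted profile: depth (cm) -> cone index (kPa)
profile = {10: 150.0, 20: 210.0, 30: 260.0, 40: 190.0}
labels = {depth: classify(ci) for depth, ci in profile.items()}
flagged = depths_needing_improvement(profile)
```

In this example the readings at 10 cm and 40 cm fall below the limit, so those layers would be candidates for excavation or soil improvement before installation.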

Conclusion
In conclusion, this study presents a novel approach for predicting soil cone index values by utilizing a combination of machine learning models, including Support Vector Regression (SVR), Gradient Boosting (GB), Decision Tree (DT), Artificial Neural Networks (ANNs), and Adaptive Neuro-Fuzzy Inference System (ANFIS).By incorporating experimental data and considering key parameters such as electrical conductivity, soil bulk density, soil moisture content, and sampling depth, the accuracy of soil compaction models is significantly improved.
Among the evaluated AI techniques, the XGBoost model demonstrates outstanding performance.It exhibits the lowest mean square error (MSE) of 0.0017 and the highest correlation coefficient (R) of 0.9986, highlighting its exceptional accuracy and reliability in capturing the complex relationships between input parameters and soil compaction.These results have significant implications for assessing land suitability, particularly in the context of wind and solar farms.
These findings have significant practical implications for the fields of agriculture, farming, and land use planning.The accurate assessments of soil compaction provided by the integrated machine learning models enable informed decision-making regarding soil management practices.This, in turn, offers the potential to optimize soil conditions and effectively address compaction issues.The outcome of such interventions can lead to tangible benefits, including enhanced agricultural productivity, increased crop yields, and the advancement of sustainable farming methods.
To conclude, the evaluation of AI techniques for predicting soil cone index values underscores the superiority of the XGBoost model.Its exceptional performance, as reflected in its low MSE and high correlation coefficient, establishes its accuracy and reliability in capturing the intricate relationships between input parameters and soil compaction.Incorporating such models into soil management practices and land suitability assessments holds great potential for improving agricultural outcomes, promoting sustainable development, and optimizing resource utilization.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Figure 1. Performance of the trained SVR model on the testing data.

Figure 2. Flow chart of the proposed technique.
XGBoost results show the performance of the model for different objectives and parameters. XGB 6 with the objective of rank:ndcg and a learning rate of 0.05 has the highest scores for both training and validation. The XGBoost model with reg:linear objective and a learning rate of 0.2 has the highest validation score of 0.602, while XGB 5 with count:poisson objective and a learning rate of 0.1 has the highest training score of 0.598. The XGBoost model was then applied to predict the XGBoost Score for new observations based on their Electrical Conductivity, Soil Bulk Density, Soil Moisture Content, and Sampling Depth values. The XGBoost Score ranges from 0 to 1, where a higher score indicates a better prediction. The results show that the XGBoost model can accurately predict the XGBoost Score for new observations, with scores ranging from 0.809 to 0.956. The details are shown in Tables 4 and 5. Additionally, Fig. 4 provides a visual comparison between the predicted data generated by the XGBoost model and the actual output values. The alignment between the predicted and actual values shows how well the model captures the underlying patterns and trends in the soil compaction prediction task. Overall, the figure further validates the effectiveness of the XGBoost model in predicting soil properties.

Figure 4. Comparison of predicted data versus testing data.

Figure 6 presents the training error of the ANFIS model, showing the gradual reduction of errors during the training and testing phases. The graph illustrates the model's improved performance over time, as indicated by the decreasing error values. Figures 7 and 8 provide a visual representation of the ANFIS model's predictions compared to the actual output data for the training and checking datasets, respectively, demonstrating the model's ability to capture the underlying patterns and trends in predicting soil cone index values.

Figure 5. Regression chart for soil cone index prediction evaluation.

Figure 6. Training error of the ANFIS model.

Figure 7. Training data of the ANFIS model.

Figure 8. Testing data of the ANFIS model.

Figure 11. Soil cone index profile below a certain threshold capacity.

Figure 12. Soil cone index profile above a certain threshold capacity.

Table 1. SVR results of analysis of variance of soil cone index.
**Significant at probability levels of 1% and 5%, respectively.
*Significant at probability level of 10%.

Variation source Degree of freedom Mean square

Table 2. Sample observations and CART results.

Table 5. XGB part of the results.

Table 6. Evaluation metrics for neural networks trained using Levenberg-Marquardt algorithm.

Table 7. Neural network architectures with optimized hidden layer neuron.

Table 8. Correlation coefficients for designed networks.

Table 10. Evaluation of proposed AI techniques. Bold values are used to emphasize and highlight the differences between the techniques.